-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Propagate Python exceptions from c_is_list_like (#33721) #33723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Propagate Python exceptions from c_is_list_like (#33721) #33723
Conversation
@@ -1 +1 @@ | |||
cdef bint c_is_list_like(object, bint) | |||
cdef bint c_is_list_like(object, bint) except -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm don't think a bint can ever return -1. The only thing I think you could except here is NULL but would really have to refactor to do that
Is there a particular issue you are trying to solve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing.
The ignored exception led to a segfault in https://github.com/rapidsai/cudf that I'm unable to reproduce at the moment since we've since refactored our code a bit -- apologies.
However, we can demonstrate the issue by raising artificially within c_is_list_like
-- I believe it's valid to assume that Python exceptions can originate from within this function.
- Without the
except -1
:
cdef inline bint c_is_list_like(object obj, bint allow_sets):
raise ZeroDivisionError()
return (
isinstance(obj, abc.Iterable)
# we do not count strings/unicode/bytes as list-like
and not isinstance(obj, (str, bytes))
# exclude zero-dimensional numpy arrays, effectively scalars
and not (util.is_array(obj) and obj.ndim == 0)
# exclude sets if allow_sets is False
and not (allow_sets is False and isinstance(obj, abc.Set))
)
>>> import pandas as pd
>>> pd.api.types.is_list_like([])
ZeroDivisionError
Exception ignored in: 'pandas._libs.lib.c_is_list_like'
ZeroDivisionError:
False
- With the
except -1
:
cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
raise ZeroDivisionError()
return (
isinstance(obj, abc.Iterable)
# we do not count strings/unicode/bytes as list-like
and not isinstance(obj, (str, bytes))
# exclude zero-dimensional numpy arrays, effectively scalars
and not (util.is_array(obj) and obj.ndim == 0)
# exclude sets if allow_sets is False
and not (allow_sets is False and isinstance(obj, abc.Set))
)
>>> import pandas as pd
>>>pd.api.types.is_list_like([])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/_libs/lib.pyx", line 985, in pandas._libs.lib.is_list_like
return c_is_list_like(obj, allow_sets)
File "pandas/_libs/lib.pyx", line 989, in pandas._libs.lib.c_is_list_like
raise ZeroDivisionError()
ZeroDivisionError
(-1
seems to be a valid value for bint
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ignored exception led to a segfault
Any idea what value was being passed? Its hard to imagine what could cause this to raise unless you specifically rigged it to make isinstance checks fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha - so after digging up some old code, it looks the source of the problem was a RecursionError
, which happened to be thrown within isinstance
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minimal repro (segfaults for me):
In [1]: import pandas as pd
In [2]: def foo():
...: pd.api.types.is_list_like("test")
...: foo()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 is appropriate here as it signals cython that there maybe is an exception to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u want to add the test ok or can just push this
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -659,6 +659,8 @@ Other | |||
- Bug in :meth:`Series.map` not raising on invalid ``na_action`` (:issue:`32815`) | |||
- Bug in :meth:`DataFrame.__dir__` caused a segfault when using unicode surrogates in a column name (:issue:`25509`) | |||
- Bug in :meth:`DataFrame.plot.scatter` caused an error when plotting variable marker sizes (:issue:`32904`) | |||
- Propagate Python exceptions from :func:`_lib.c_is_list_like` (#33721) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don’t need the whats new as this is pretty obscure and not user facing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think this is the right solution. As is this function would never return -1 so it's a little confusing to declare that a return value of -1 should propogate errors.
Do you know within this function where the segfault actually happens? I think the issue is probably with util.is_array
if anything
May also be an issue that |
Out of skepticism I tried changing the cimport and it didnt change the segfault. |
The fact that the function will never return -1 is exactly why it is used to signal exceptions. Here are the relevant Cython docs: https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#error-return-values This pattern is also used in other places in https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/lib.pyx#L1096 |
In this particular case, a segfault happens because a |
The whatsnew can be removed, then LGTM |
@@ -1 +1 @@ | |||
cdef bint c_is_list_like(object, bint) | |||
cdef bint c_is_list_like(object, bint) except -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 is appropriate here as it signals cython that there maybe is an exception to check.
Misunderstanding on my end - thanks for the link.
Lgtm
…Sent from my iPhone
On Apr 23, 2020, at 10:10 AM, Jeff Reback ***@***.***> wrote:
@jreback requested changes on this pull request.
In pandas/_libs/lib.pxd:
> @@ -1 +1 @@
-cdef bint c_is_list_like(object, bint)
+cdef bint c_is_list_like(object, bint) except -1
-1 is appropriate here as it signals cython that there maybe is an exception to check.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shwina if you can add the test as proposed would be great (the recursion one); ping on green.
Thanks -- will do. Any suggestions for where to put that test? |
pandas/tests/dtypes/test_inference.py |
I'm now having second thoughts about the robustness of this test: it assumes that the Any thoughts here? |
what if got rid of c_is_list_like and just made is_list_like |
That would solve this problem. But it looks like Not sure whether the overhead that will be introduced by calling a |
I think that having a fast |
@shwina can you post the test that you have for now? Will be easier for review and filing comments |
lgtm can u merge master and resolve conflicts and ping in green |
Thanks @jreback - done. |
thanks @shwina |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff